19 - Artificial Intelligence II [ID:52591]

The usual quick recap. We started talking about multilayer perceptrons in comparison to single-layer perceptrons. We know what we want to do with multilayer perceptrons; we now first have to show what they can even do. The answer: they can represent all continuous functions with just two layers, and once you have three layers you can basically represent arbitrary computable functions, given enough nodes of course. This is obviously not a proper proof here; it is more of an intuition, in the sense that if you think of a single-layer perceptron as basically giving you one of those ridges, then with two layers you can combine multiple such ridges, combine those to get bumps, and then combine the bumps to approximate arbitrary functions. In the limit, given enough nodes, that basically allows you to approximate any function graph you want.
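To make that intuition a bit more concrete, here is a small numerical sketch (my own, not from the lecture slides): subtracting two shifted sigmoid units yields a localized bump, and a weighted sum of such bumps already traces out a rough approximation of a one-dimensional target function. The steepness, the offsets, and the choice of sin(x) as target are all illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4.0, 4.0, 400)

# Two shifted soft thresholds, subtracted, give a localized bump over [-1, 1].
bump = sigmoid(10 * (x + 1)) - sigmoid(10 * (x - 1))

# A weighted sum of such bumps approximates a target function, here sin(x),
# as a smoothed step function over unit-width intervals.
centers = np.arange(-3.5, 4.0, 1.0)
approx = sum(
    np.sin(c) * (sigmoid(10 * (x - (c - 0.5))) - sigmoid(10 * (x - (c + 0.5))))
    for c in centers
)

print(float(np.max(np.abs(approx - np.sin(x)))))  # rough approximation error
```

Using more hidden units, i.e. narrower bumps, shrinks this error, which is exactly the "given enough nodes" caveat above.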

If we want to learn multilayer perceptrons, we now have to do backpropagation, and we've gone over how that works. I rather nonsensically claimed that you don't need to keep the intermediate activations; that was obvious nonsense in hindsight. I've also checked my own implementation, which I wrote for fun, and of course I kept them around. In particular, if you consider that there are nodes that are not McCulloch-Pitts units, such as various kinds of pooling layers, you obviously need to keep your activations around to figure out which path through the network your forward pass even took. So, just to emphasize that again: of course we keep the activations, and the algorithm we put here does indeed do that; it stores all of them in one big vector a_j, where j ranges over all nodes, and we then use them when we update the weights.
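As a minimal sketch of the point, assuming a two-layer network of sigmoid units trained with squared error (the variable names and the toy data are mine, not the slides'): the backward pass reads the stored activations both when computing the local deltas and when forming the weight updates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W1, W2):
    a1 = sigmoid(W1 @ x)   # hidden activations, kept for the backward pass
    a2 = sigmoid(W2 @ a1)  # output activations, kept for the backward pass
    return a1, a2

def backward(x, y, a1, a2, W1, W2, lr=0.1):
    # Output-layer delta uses the stored output activation a2.
    d2 = (a2 - y) * a2 * (1 - a2)
    # Hidden-layer delta uses the stored hidden activation a1.
    d1 = (W2.T @ d2) * a1 * (1 - a1)
    # The weight updates need the stored activations as well.
    W2 -= lr * np.outer(d2, a1)
    W1 -= lr * np.outer(d1, x)
    return W1, W2

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, y = np.array([0.5, -0.2]), np.array([1.0])
a1, a2 = forward(x, W1, W2)
W1, W2 = backward(x, y, a1, a2, W1, W2)
```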

Okay, so much for backpropagation. And of course, once you have backpropagation and multilayer perceptrons, there is little stopping you from learning arbitrary functions, except for time, resources, data, all of those kinds of things. So something like the restaurant example now becomes very much feasible. This graph is not very informative because it doesn't actually tell you the number of nodes; this one thankfully does: if we have a multilayer perceptron with just four hidden units, then over time we do get there, and at around a hundred examples we are close to 0.9 accuracy. The interesting thing about the learning curve is that it is a function of several things. You might think: okay, it takes a lot longer to converge than decision trees, so maybe we can improve things by just adding nodes. But if you add nodes, that also means it takes longer to converge, because there is more stuff you need to train, and you need more data, and all of that.
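For reference, this is roughly how such a learning curve is produced: train the same small network on growing prefixes of the training data and record the test accuracy each time. The restaurant data itself is not reproduced here, so the sketch uses a synthetic stand-in dataset; the four hidden units match the network size mentioned above, everything else is an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the restaurant data: 100 training and 100 test examples.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=100, random_state=0)

for n in (10, 25, 50, 100):
    clf = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
    clf.fit(X_train[:n], y_train[:n])
    print(n, "training examples -> test accuracy:", clf.score(X_test, y_test))
```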

Of course, the comparison is also to some extent unfair, because the restaurant example is exactly the kind of problem that is amenable to decision trees in the first place; it basically is a decision tree, so obviously decision tree learning does very well here. One thing to emphasize again is the big drawback that neural networks have: they are utterly incomprehensible, in the sense that once you've trained a neural network you get an output, but you don't get an explanation for why the output is the way it is. Decision trees give you that; neural networks do not. So depending on the application you have in mind, they might very well be the wrong tool to use for that fact alone. Obviously that depends on the application: if all you're interested in is whether this is a picture of a cat, who cares why; just train a CNN and that's perfectly fine. If you want to do things such as assessing people in any capacity whatsoever, whether that is credit cards, loan approvals, job interviews, all of that kind of stuff, stay the hell away from neural networks. The likelihood that the data you use to train your model is biased a priori is extremely high, the trained network will reflect that bias, and once you've trained it you have no way of identifying that the bias is there.

A standard task for neural networks is handwritten digit recognition, standard in the sense that it's the standard toy task you would use to just experiment with things. If you want to do that, I very much recommend downloading the MNIST dataset, M-N-I-S-T, which consists of exactly such images at a resolution of 28 by 28, so they're pretty small and as such quite usable for just experimenting with things. And then if you do that and you just build a very small network with, let's say, two hidden layers or
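As a rough sketch of that kind of experiment, here is a small fully connected network with two hidden layers trained on scikit-learn's copy of MNIST from OpenML ("mnist_784"); the layer sizes, the train/test split, and the iteration count are all illustrative choices, not the lecture's.

```python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Fetch the 70,000 MNIST digits (28x28 pixels, flattened to 784 features).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000, random_state=0)

# Two hidden layers, as in the setup described above; the sizes are arbitrary.
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=20, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```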
